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ABSTRACT 

Map-making presents a significant computational challenge to the next generation of kilopixel 
CMB polarisation experiments. Years worth of time ordered data (TOD) from thousands of 
detectors will need to be compressed into maps of the T, Q and U Stokes parameters. Fun- 
damental to the science goal of these experiments, the observation of B-modes, is the ability 
to control noise and systematics. In this paper, we consider an alternative to the maximum- 
likelihood method, called destriping, where the noise is modelled as a set of discrete offset 
functions and then subtracted from the time-stream. We compare our destriping code (Descart: 
the DEStriping CARTographer) to a full maximum-likelihood map-maker, applying them to 
200 Monte-Carlo simulations of time-ordered data from a ground based, partial-sky polarisa- 
tion modulation experiment. In these simulations, the noise is dominated by either detector or 
atmospheric 1// noise. Using prior information of the power spectrum of this noise, we pro- 
duce destriped maps of T, Q and U which are negligibly different from optimal. The method 
does not filter the signal or bias the E or B-mode power spectra. Depending on the length of 
the destriping baseline, the method delivers between 5 and 22 times improvement in computa- 
tion time over the maximum-likelihood algorithm. We find that, for the specific case of single 
detector maps, it is essential to destripe the atmospheric 1// in order to detect B-modes, even 
though the Q and U signals are modulated by a half-wave plate spinning at 5-Hz. 

Key words: cosmic microwave background - methods: data analysis - methods: statistical 



1 INTRODUCTION 



Measurements of the temperature and polarisation anisotropy of 
the cosmic microwave background (CMB) have been used to 
const rain a number of c osmological parame t ers to high preci- 
' sion JSmoot et al.l Il99i iHananv et all l200d iMasi et al. I I2OO6L 
iHinshaw et aPb OOS'). 

There are, however, still open questions that haven't been an- 
swered with the new results. In particular there is still no evidence 
for or against the existence of primordial gravity waves left over 
from a period of inflation at very early times. A bath of gravita- 
tional radiation should leave its imprint on the polarisation of the 
CMB in the form of a unique B-mode pattern. Such a signal will 
be faint compared to the polarisation that arises from density per- 
turbations. 

If we are to make a convincing measurement of B-modes (or 
rule out their existence), we need to construct an experiment with 
far greater sensitivity than the current state of the art. Given the 
physical limitations on the properties of individual detectors, the 
only realistic way of doing this is by observing the sky with mul- 
tiple detectors for long periods of time. Most experiments that are 
being proposed have these characteristics. 



^A notable example is the CLOVER experiment jNorth et al.l 

120081) . This experiment will have 576 detectors and will focus on 
small fractions of the sky for periods of up to two years. With cur- 
rent planned technology, it is hoped that the noise per map pixel 
in the experiment will be below 0.8-/iK. To achieve this, it is es- 
sential to supplement the increased number of observations that 
multiple detector pixels bring by removing correlated noise in the 
map-making stage. 

Solutions to the map-making problem aim to compress ter- 
abytes of time ordered data (TOD), acquired over months or years 
of scanning the sky, into a pixelised sky map, with noise uncorre- 
lated between sky pixels. In each time ordered datum, the signal 
is typically swamped by noise, necessitating the process of bin- 
ning and averaging at the heart of every map-making algorithm. 
Unfortunately, simple averaging is highly sub-optimal for this task 
because the noise in the TOD is correlated due to 1// noise. One 
could attempt to high-pass filter the resulting low frequency 1// 
noise drifts, thereby leaving random uncorrelated or white noise 
in the TOD. Such a method is not lossless, as it filters the signal 
as well. Whilst the effects of filtering can be mitigated in multi- 
pole space, by simulating its effects using Monte-Carlo simulations 
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JHivon et al.l2002r) . the variance of the recovered power spectra are 
larger. 

A number of previous experiments, including COBE, MAX- 
IMA and BOOMERANG, have opted to use the optimal max- 
imum likelihood map-making algorit hm (eg: ISmoot et alj 1 19921 
iHananv et al] I2000I . iMasi et al. I l200d). This algorit h m has been 



vyell d is cussed in the li t erature ( eg: TegmarkI 1 1997af). Natoli et al.l 
ll200lb . IStompor et all i l2002h . Ide Gasperis et alJ feOOSh ) and Is 
renowned for being slow and memory intensive. In addition, the 
algorithm requires accurate knowledge of the noise statistics of 
the TOD. Use of this algorithm is impossible for the next gen- 
eration of high resolution, kilo-pixel, polarisation experiments, 
where we will require Monte-Carlo noise maps to reconstruct the 
noise covariance required for estimating B-modes on the cut sky 
Jsini th & Zaldarriaga 2007). 

Destriping is a pro mising altern a tive to the maximum l ikeli- 
hood approach ( .Keihanen et al.l ( l2004l) . lKeihanen et alj ( l2005h ). By 
modelling the correlated noise in the TOD as a series of offset func- 
tions mapped onto the time-series, a nearly optimal map can be re- 
turned very quickly by solving for the offset function amplitudes. 
This system is much smaller than the full maximum likelihood sys- 
tem and can therefore be solved at a fraction of the computational 
cost. Destriping has been well discussed in the literature for the 
Planck experiment and favourably compared, in simulations, to the 
optimal solution: destriping can produce near-optimal maps in or- 
ders o f magnitude less time jPoutanen et alj|2006l . lAshdown et al.l 
l2007al . lAshdown et al.ll2007br) . A variant of destriping has been 
successfu lly used in the Archeop s balloon experiment analysis 
pipeline ( Ma cias-Perez et al.l 120071) . However, these experiments 
are very different to upcoming ground-based B-mode experiments 
in terms of scanning strategy, presence of atmospheric noise and in 
the number of detectors. 

Planck and Archeops use a circular scanning strategy: scan- 
ning the CMB in overlapping great circles before re-pointing to 
scan a new great circle. Scanning strategies are very different for 
ground based experiments, as a matter of necessity, because the 
scanning must be in azimuth only (constant elevation) in order to 
avoid extremely high atmospheric noise contamination from chang- 
ing elevation. This causes a large difference in the level of cross- 
linking, which has a strong effect on the performance of map- 
making. 

The biggest difference between ground based B-mode ex- 
periments and Planck or Archeops is the presence of atmosphere 
(Archeops flew so high it was essentially above the atmosphere) . 
Atmospheric 1// noise dominates the correlated noise in the TOD, 
even for the current generation of ground based experiments. It 
has a higher knee frequency and higher spectral index than the 
detector noise. For example, raw (undif ferenced) TOD from the 
QUaD experiment JHinderks et al.ll2008h appears to exhibit 1/ f"^ 
atmospheric noise rather than simple 1//. Furthermore, the long 
term noise drifts will be very correlated between detector pixels 
JBussmann. Holzapfel & Kudl2005l) . 

In this paper, we concentrate on experimental designs that 
modulate the faint polarisation signal to frequencies higher than 
the detector 1// knee frequency fknee, using a rotating half- 
wave plate, so that Q and U will be sampled in the white 
noise regi me. Upcoming e xperimen ts with this desig n include 
C/OV ER JNorth et al.ll2008h EBEX iOxlev et aljboosl) and SPI- 
DER JMacTavish et al. 1120070 . These modulation experiments aim 
to mitigate possible detector 1// using hardware. However, so- 
phisticated map-makers will still be required to produce optimal 
Q and U maps. If we are to measure B-modes, we must treat the 



systematics in the data properly. One of these systematics will be 
instrumental polarisa tion, which causes T -^ P leakage in the data 
dJohnson et alj2007h . If the leakage is significant, it will be vital to 
remove the leaked temperature signal, a process that will require a 
high-resolution, near-optimal map of the temperature anisotropies 
as observed by the instrument. Such leakage may also mix atmo- 
spheric 1// noise into the Q and U data, for which a sophisticated 
map-maker like destriping will be needed to remove leaked 1// in 
demodulated Q and U time-streams. The atmospheric noise itself 
may also contain a polarised com ponent that would likewise ha ve 
to be removed in the map-making JHananv & Roseiikranzll2003l) . 

This paper is the first in a series evaluating destriping as ap- 
plied to ground-based, partial-sky, experiments. We examine the 
speed and accuracy of destriping for modulated experiments using 
simulations, dominated either by detector or atmospheric correlated 
noise, of a single detector pixel. This is akin to assuming that the 
noise is uncorrected between detectors. We note that this is a poor 
assumption for the atmospheric 1// component: we will generalise 
from this assumption in the next paper in the series (see also Ap- 
pendix |BJ. 

In Section[2l we put our discussion on firm footing by deriving 
the optimal and destriping solutions to the map-making problems 
and discussing details of the implementation of the algorithms. In 
Section [S] we describe the simulation of the TOD and the scan- 
ning strategies. In SectionlH we first apply the algorithms to a pre- 
liminary signal-only simulation to test for bias and then proceed 
to apply the methods to the full Monte-Carlo simulations of TOD 
containing both signal and noise, comparing the performance of the 
algorithms directly with emphasis on speed and accuracy. 



2 THE ALGORITHMS 

Time ordered data, denoted by a time vector yt, is formed from the 
sum of two time-space vectors, the sky signal St and total noise 
per observation rit . The signal vector is created by the scanning of 
a telescope across the sky, which we consider to be innately pix- 
elised. With this assumption, the signal stream becomes the result 
of the action of a (Num^ x Npi^ei) projection operator, Ptp, on the 
sky map Xp. 

This can be described in tensor notation by 



yt 



-t'^tv^v 



-nt. 



(1) 



The reverse operation of Ptp is the binning operator P 



Ppt (where the superscript T denotes the transpose), which sums 
the TOD into a map. The two operators acting together, Ppt Ptp, 
sum the number of observations per map pixel. 

In the case of no noise, the map could be exactly returned by 
averaging: biiming the TOD into a map and dividing each pixel by 
how many times it was observed 



Xp — yPpfPfp) Pptyt, 



(2) 



which also returns the minimum- variance map in the case that the 
noise nt is white. This defines the simplest "naive" map-making 
algorithm. 

With polarisation measurements, we can measure 3 CMB 
skies: the unpolarised (or temperature) sky T and two skies cor- 
responding to the Q and U polarisation parameters. In this case, 
the sky map vector becomes 
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T 



U 



(3) 



a vector of length (3 x Npi^^i), which has covariance matrix 
C = (xTS) = 



(TT)(TQ)(TU) 
(QT)(QQ)(QU) 
(UT)(UQ)(UU) 



(4) 




with dimensions (SNpixei x SNpixei)- 

The projection operator P describes the modulation of the Q 
and U signals into the TOD, expanding into T,Q and U polarisation 
pointing matrices 



(5) 



The half-wave plate experimental design modulates the Q and 
U signals into the time stream at a frequency uj — 4/, where the 
modulation frequency / is the frequency of half-wave plate rota- 
tion. The modulation frequency is chosen to be larger than the knee 
frequency of the correlated 1// noise, such that the low intensity Q 
and U signals are sampled in the white noise regime, at a frequency 

1^ > 4: f knee- 

The signal part of the modulated time stream is then 
St = ^Ptp (Tp + Qp cos 4f3t + Up sin 4/3t) . (6) 

where /3 is the orientation angle of the half-wave plate with re- 
spect to the sky and Ptp is the simple pointing matrix from (TJ 
djohnson et alj2007h . 

This gives us the polarisation pointing matrices from ^ 



Ft 

Pu 
2.1 



= -Pt 



tp 



, /cos4^\ 



tp 



(7) 



The Maximum-Likelihood Solution to the Map-Making 
Problem 



To estimate the sky map Xp, we begin with Bayes' theorem 

p{e\Di) oc p{D\ei) X P{e\i), (8) 

where 6 is our list of parameters, D is the data and / is all of the 
assumed background information of the experiment. In the map- 
making problem, the data is the TOD vector yt and the parameters 
are the sky map Xp and the noise Ci^^tt' = (^(i)^(i'))- Bayes 
theorem now becomes 

PixpCN,u'\ytI) ex P{yt\xpCN,u'I) X Pixt\I)PiCM,u'\I)-(9) 

We assume that the noise is a stationary Gaussian realisation 
of N{f), an underlying noise power spectrum, such that (rit) = 0, 
Cjv^^, = Cjv(t ~ t'). Furthermore, we assume complete a priori 
knowledge of the noise power spectrum N{f). This is like assum- 
ing a delta function prior on the noise, which effectively becomes 
part of the background information. With a flat prior on Xp, the 
posterior is simply proportional to the likelihood 



P{x\yl) oc exp(^- ^j, 



with 

where we have switched from tensor to matrix notation. 
Solving for x by minimising x^, we find 



'=(P' 



'Cn'P) 



^P^C 



N y- 



(11) 



(12) 



which has pixel noise covariance matrix (obtained from the second 
derivative d^x^ /^x^) 



c = (P^Cj;,^p) 



(13) 



The matrix (P^Cn ^P), sometimes called the weight ma- 
trix, is large and sparse. Traditional construction and inversion is 
prohibitive, with the memory requirement of storage scaling as 
0{Npi^^i) and explicit inversion methods scaling as 0{Npi^^i). 
Fortunately, there are a number of computational tricks at our dis- 
posal. 

The first trick is used to make the computation of C^""^ feasi- 
ble. Cn is an {Nt x Nt) matrix whose explicit construction and 
inversion is impossible. However, if we assume that the 1// noise 
in the TOD is stationary, the ensemble average matrix Cn can 
be well approximated as circidant (where each row is exactly the 
same as the row above, all shifted one column to the right). Sys- 
tems involving circulant matrices can be inverted by deconvolution. 
If Ax — h, where A is circulant, then x — T^^[J-^]/J-[Ain\\, 
where T and T~^ denote Fourier and inverse Fourier transforma- 
tion respectively and Ai„ is a vector formed of the first row of 
matrix A. 

This assumption is not valid for real data, as the TOD would 
have to wrap around to correlate the ends of the time-stream. For 
noise simulated by Fourier transforms however, the circulant ap- 
proximation is exact. We note that for real data, the noise covari- 
ance is in fact a symmetric Toeplitz matrix, whose inverse can 
be calculated ex pensively m 0(N t) operations using Levinson's 
method (see eg: IPress et alj|2002h . The standard approach is to 
use the "MADCAP approximation" when inverting this matrix: 
the matrix is inverted as if it were circulant and then elements far 
from the diagonal are set to zero to enforce a Toe plitz structure, 
Cj^^(t - i') = for \t - t'\ > Nt/2 l lBorrillll 19991) . This approxi- 
mate inverse covariance can then be convolved with the TOD using 
FFTs. 

For the noise covariance of our simulations, we have that 



^[Cni,]=P(/), 



(14) 



(10) 



where P{f) is the ensemble average noise power spectrum and 
Cni* is the first row of Cn. 

The second trick speeds up the inversion of the weight matrix 
itself. Rather than inverting the matrix explicitly, we solve the sys- 
tem in l ll2t iteratively, using an algorithm from the conjugate gra- 
dients family. A number of maximum likelihood implementations 
have used a preconditioned conjugate gradients (peg) al gorithm for 
this ta sk (eg: ROMA (Natofi et al. 2001), MapCUMBA JDore et al.l 
I2OOII) and SANEPIC tPatanchon et al..2007 )). However, we imple- 
ment a more robust algorithm from the conjugate gradients family: 
MINRES, an algorithm capable of solving any invertible symmet- 
ric matrix, regardless of positive definiteness (Barret et al. 200g). 
By definition, the matrix is symmetric and should be invertible if 
there is a sufficient dispersion of polarisation measurement angles 
for each pixel. The weight matrix should, analytically speaking, be 
positive definite. We implement MINRES to protect against numer- 
ical departures from positive de finiteness that can propagate from 
inaccuracies in noise estimation ( INatoli et al.ll200ir) . 
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The speed of iterative inversion depends on the condition 
number of the matrix (equivalent to the absolute ratio of the largest 
and smallest eigenvalues of the matrix). A poorly conditioned sys- 
tem is not well posed for iterative inversion, so we insert a pre- 
conditioning step into the algorithm. The system Ax — b can be 
preconditioned using preconditioner matrix K as follows 

K^^ AK^\K2x) ^ K^H (15) 

where the left and right hand preconditioners K\ and K2 are de- 
fined by K1K2 = K. The preconditioner should be chosen such 
that K~^ ~ A~^, but it should also be quick to compute. In prac- 
tice, choosing iC" is a trade off between improvements in conver- 
gence and added computation time. The size of the system prohibits 
the use of a sophisticated preconditioner, such as an incomplete 
Cholesky factorisation, so we use the simple but effective point Ja- 
cobi preconditoner. This is simply the diagonal of the weight ma- 
trix, which we approximate as 



K„ 



E E 

t t'=t-Xr 



[PptC j^ .^, Pt' 



,)5, 



pp'j 



(16) 



for a correlation length Ac (Sppi is the Kronecker delta). 

Iterative inversion has a further advantage. The explicit calcu- 
lation of the weight matrix is not required, we only need to repro- 
duce the action of its matrix multiplication on the sky map x. This 
is easily done with the following algorithm (in which the weight 
matrix is factorised into its component operations) 



(p^Cn^p; 



X — J^^tri'^ 



T 



[PtpS] 



T\C 



Nit 



(17) 



a process of projection, deconvolution (using fast Fourier trans- 
forms) and binning. This final trick has decreased the memory re- 
quirement for the matrix from 0{Np^^^i) to 0(Nt). 

It should be noted that the computation time for the optimal 
solution is still unfeasible for future data sets, despite the tricks 
implemented into the algorithm. The operational scaling of the al- 
gorithm is 0{Nit X Ntil + log2 lA'tl)), where Nu is the number 
of iterations. The binning and projection operators (Ptp and Ptp) 
scale as 0{Nt) so that each iteration is dominated by the FFT and 
inverse FFT (both of which scale as 0(Nt logj \Nt\)). 



2.2 An alternative: the destriping method 

There is an alternative, approximate, method for modelling the 
noise in the time series. The noise vector n can be modelled as 
uncorrelated white noise plus a series of discrete offsets that rep- 
resent correlated noise. The amplitudes of the offsets are estimated 
and subtracted, as illustrated in Figure [T] whilst leaving the sig- 
nal untouched. This approach, called destrip ing, has recently been 
derived in a rnaxim um-likelihood context dKeihanen et al.l l2004l . 
iKeihanen et al.l2005r) . whose notation we follow. The latter of these 
papers describes the MADAM algorithm, whose solution we use 
here. 

We write the noise vector rit as a sum of vectors of uncorre- 
lated (white) noise, nw, and correlated noise, Ucorr, 



Tit — Till; -\- Ticorr- 



(18) 



The essence of the destriping method is to approximate ricorr 
as an expansion of a set of offset functions, Fq, with amplitudes a 



corr — / -T ti^ 



(19) 
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Figure 1. Upper panel Plot of a section of raw TOD with long term 1// 
noise drifts and the maximum likelihood offset functions (light grey) from 
Descart projected onto it. Lower panel Same as above except the TOD is 
signal only, so the offset amplitudes are all zero. 



Fti 



The simplest choice of offset function is a constant: 

1 t£ Ai 
otherwise. 



(20) 



where A; is a chunk of the TOD. In this scheme, FiUi is a discrete 
jump in TOD space from outside chunk i to constant a; in side 
i. Thus, the sum in l |19t produces a time domain vector of constant 
offsets with amplitudes a approximating the correlated noise. More 
complicated offset functions can be chosen, for example Fourier 
series or Legendre polynomials. 

The amplitudes a will be Gaussian random numbers satisfying 

K) = (21) 

{aaoi) = Caaa'- (22) 

With this approximation for correlated noise, equation ([l} for 
the TOD can be re-written as 



y = Px + Fa + nw ■ 



(23) 



Again we minimise the x , but this time we require solutions 
for both the map x and the amplitude vector a. With a as a second 
parameter, the likelihood of the TOD becomes 

P{y) = Piy\^, a)P{a\x)P{x). (24) 

However, the probability distribution of the amplitudes is in- 
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dependent of the CMB, which we consider to be deterministic 
and have no associated probability distribution. Thus, the amph- 
tude probabihty distribution becomes an independent prior on a, 
P(a\x) — P{a), and the CMB prior is a constant. The likehhood 
is now given by 



P{y)=P{y\x,a)P{a). 



(25) 



The hkelihood in i25t is simply the white noise distribution, 
which is a Gaussian with covariance N — a^Sa 



P{y\x,a) = (27r|N| 



\-l/2 



cxp 



-n,„JN Un 



(26) 



The next step is to decide what prior information on the ampli- 
tudes to include through P{a). If we have an estimate of the 1// 
noise power spectrum, we can include prior information through 
the noise covariance matrix Cn- The probability distribution for 
the offsets is Gaussian 



P{a) = (27r|Ca 



-1/2 



exp 



a C„ 

2 



(27) 



where Ca is a re-projection of Cn through Ca = 
(F''"F)~'^F^CnF, which for our simulations is circulant 
(but for real data is symmetric Toeplitz and would be inverted and 
convolved as described in Section IzH . Forming a x^ from these 
distributions we obtain a function to minimise for the amplitudes 
and the map: 



-21n|P(y)| 

(y - Fa - Px)'^N~'^{y- Fa - Pf ) 



-a tj„ a. 



(28) 



The x^ can be simplified by writing the map x in terms of the 
data through x = (P'^N"^P)"^P'^N"^(j7- Fa). This equa- 
tion can be recognised as simple naive binning, as the noise in the 
destriped time-stream y — Fa is white A*' = a^Sij. 

We can now gather terms involving P and P''^ into a single 
operator Z 



X 
where 



{y- Fa^Z'^'N^'^Ziy- Fa) + a^C'^i 



I - P(P'^N~^P)'^P'^N~^ 



(29) 



(30) 



and I is the identity matrix. 

The operator Z is a signal cleaning operator. The operation Zy 
removes the signal component from yby subtracting a naive map of 
y projected onto TOD space. Z also has the property Z^N~''"Z = 
N~^Z. 

We obtain an estimator for a by minimising ( |29t with respect 
to a 



(F'^N^^ZF + Ca^)a = F'^N'^Zy. 



(31) 



This is an inverse problem like that of the maximum likeli- 
hood algorithm. We can use many of the same tricks in solving it. 
We solve the system iteratively, using a preconditioned conjugate 
gradients (peg) algorithm. The system is typically smaller than that 



in the previous section because N^ 



< Npi^e.1 for a single day. 



Again, we do not explicitly construct the matrix, rather we do each 
operation on both sides of \i\\ individually. 

This system solves much more quickly than the maximum 
likelihood system. The operations on the left hand side of OU scale 



as 0(Nt). The exception to this is the inversion of the circulant 
offset covariance matrix Ca, which can be achieved through cyclic 
deconvolution scaling as Oijia logj \na\)- Each iteration of the de- 
striping algorithm scales as 0{Nt + Ua logj |na|), in comparison 
toC'(A''t(l+log2 lA'^tD), the iterative scaling of the maximum like- 
lihood algorithm. 

The system is preconditioned using preconditioner K such 
that K~^Ax = K~% where 



Tf = f'^n'^f + cr^ 



(32) 



is a circulant matrix and is inverted using cyclic deconvolution. 

With the amplitude vector found, one subtracts the correlated 
noise approximation Fa from the TOD y and naively bins the 
cleaned TOD to return the destriped map 



f=(P'^P)-^P^(y-Fa), 

The noise covariance of this map will be 
C = (P'^(N + FCaF'^)"^P)"\ 



(33) 



(34) 



where N — {riujn^) = a^Sij is the white noise covariance and 
Ca ~ {aa^) is the covariance of the amplitudes. 



3 SIMULATIONS 

We produced 4 sets of simulations, each with 200 signal-l-noise and 
200 further noise only realisations, from 12 hours of observing us- 
ing a single detector sampling at fsam ~ 100-Hz for 2 scanning 
strategies. The simulations are summarised in Table [T] The exper- 
imental design was that of a modulation experiment scanning at 
l°s~^ with a half wave plate rotating at frot = 5-Hz, correspond- 
ing to a polarisation modulation frequency /mod ~ 20-Hz. 

Our aim is to examine the capabilities of different algorithms 
on the same data. For this goal, the white noise variance is essen- 
tially irrelevant: what we care about is the level of correlation be- 
tween TOD and thence map pixels. To see B-modes will require 
years of observing with hundreds of detectors. For this work, in 
which we are simulating 12 hours of data from a single detec- 
tor, we have chosen a white noise level that allows us to convinc- 
ingly measure B-modes. For all our simulations, we use a heuristic 
NET= 0.242 fiK^. 

Two noise scenarios are simulated. The first of these repre- 
sents detector dominated 1//, as would be returne d for a space- 
bourne or balloon flight experi ment (like Maxi Pol JJohnson et al.l 
2007) or Archeops (Macias-Pe rez et al.l |2007|) ) and has a 1// 
power spectrum with spectral index a = 1.0 and knee frequency 
0.1-Hz, the projected noise correlation properties of the CjOVER 
detectors. 

The second scenario simulates 1// dominated by atmospheric 
fluctuations, assumed to be un-polarised, which is expected to 
be the case for ground based experiments. The atmospheric 1// 
noise is simulated with spectral index a = 1.9 and knee fre- 
quency fknee ~ 0.2-Hz . We Calculate these numbers from a 
rough fit to sample QUaD 100 GHz noise power spectra (see figure 
36 of Hinderks et al. 2008) and they are fiducial: the level of at- 
mospheric fluctuation depends upon wind and scanning velocities 
and evidence suggests it may vary between CMB observing sites 
JBussmann. Holzapfel & Kucll2005h . 

The TOD is simulated by taking a list of telescope pointings 
from a scanning strategy and pulling out the T, Q and U signals 
from simulated CMB maps at the right ascension and declination of 
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Table 1. Summary of the simulations 
Strategy tensor to scalar ratio knee frequency spectral index 
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Figure 2. Input theoretical power spectra: the solid curve is Cf"^ ; the 
dashed curve is C^^; and the dot-dashed curve is C^^ 



the pointing. The T, Q and U signals are combined into time streams 
using equation JSJ and added to an instrumental noise stream. 



3.1 CMB template 

Two theor etical power spectra were calculate d using the CAMB 
package u JLewis. Challinor & LasenbvluOOOh for a typical set of 
concordance parameters. The first spectrum was used to synthesise 
maps with B mode corresponding to an initial tensor to scalar ratio 
of r = 0.1 plus the expected B-modes from weak lensing. The T, 
E and B spectra from this model are shown in Figure |2] The sec- 
ond spectrum was calculated to produce zero B-mode simulations, 
using r = and ignoring weak lensing B-modes. 

The model power spectra were convolved with a fiducial sym- 
metric Gaussian instrumental beam of FWHM 10 arc-minutes. 
From the initial spectra, 200 Gaussi an realisations of the CMB 
were simulated using the HEALPix JGorski et alJIlOOa) package 
iyn/aJO using Uside = 512, giving resolution up to / = 1024. 



3.2 Noise Simulation 

The simulated noise stream is a sum of stationary Gaussian realisa- 
tions of uncorrelated white noise and correlated 1// noise, which 
together have power spectral density 



PU)- 



Jsan 



fk 



(35) 



^ http://camb.info 

^ http://healpix.jpl.nasa.gov/html/facilitiesnode 1 1 .htm 




Figure 3. Ra/dec plot of the sabre scan pointing. The black curve is the 
rising field scan and the grey curve is the setting field scan. 



where a^ is the white noise variance of a single observation, fsam 
is the sampling frequency of the detectors (where integration time 
t = 1/ fsam), ct is the spectral index and fhnee is the knee fre- 
quency of the spectrum. 

The noise we have simulated is both stationary and Gaussian, 
satisfying 



(n) = 
{nn ) — 





Cn. 



(36) 
(37) 



Due to the stationarity property, the covariance of the noise 
between TOD becomes a function of time separation. The noise 
covariance matrix Cn is circulant and symmetric, allowing us to 
quickly evaluate its inverse using a Fourier transform: 



c-^\t-t') = [4- 



P-\f)e 



-if{t-t') 



df, 



(38) 



where P ^ (/) is the inverse noise power spectrum. 



3.3 Scanning strategies 

We simulate TOD using two scanning strategies. The noise reali- 
sation in the TOD are the same for each scan, so this amounts to 
changing the signal in TOD. Scanning strategy has been shown to 
have an effect on the abi lity of map-makers to reduce noise cor- 
relation ( lTegmarklll997a) . Most important is the connectivity of 
the scan (or its degree of cross-linking). With greater cross-linking, 
map-makers should produce fewer stripes and smaller residuals. 
Whilst the isotropy of the scans is different, we have kept the same 
average integration time per pixel in each map. 
Our scanning strategies are the following: 

(i) Sabre scan: Simulated scanning strategy for ground based ex- 
periment sited in the Atacama desert. The scanning elevation is kept 
constant at 45° to minimise atmospheric noise. The telescope scans 
back and forth in azimuth with a sinusoidal velocity curve, whilst 
the field rises or sets through the scanning elevation. The field is 
observed twice per day as it rises and sets, resulting in a minimally 
cross-linked scan as shown in Figure [5] The experiment runs for 
~ 5 days to complete 12 hours of integration time. 

(ii) Fence scan: For half the scanning time, the telescope slews 
back and forth horizontally with sinusoidal velocity whilst the field 
moves vertically through the scan at a much slower constant veloc- 
ity. The scan is repeated for the second half of the scanning time 
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with the scanning direction and field drift directions swapped, pro- 
ducing a highly cross linked square field. In total, 12 hours integra- 
tion is completed. 

The sabre scan is representative of the typical level of cross- 
linking achievable from the ground when keeping the scanning el- 
evation constant. This is a vital constraint, as if the scanning eleva- 
tion is varied then noise sourced from changes in airmass swamps 
the faint CMB signal. Unfortunately, realistic scans are never ide- 
ally cross-linked. The fence scan represents a nearly ideally cross- 
linked scan that cannot be achieved from the ground, but it is in- 
cluded to probe the effects of cross-linking on the performance of 
destriping. 



3.4 Diagnostics 



We require from our diagnostics, comparable statistics indicating 
the fitness for purpose of the map-making algorithms. We use the 
following diagnostics: 

(i) RMS residual: Root-mean-square of the residual map be- 
tween the recovered maps and the input theoretical map used to 
generate the TOD. 

(ii) Residual angular power spectrum: spatial information about 
the comparative temperature residuals is gained by analysing the 
angular power spectrum of the residual maps. The residual map is 
convolved with map field's window function VKj, so a modification 
of the MASTER method JHivon et alj|2002h is used to return an 
unbiased binned estimate of the residual (noise) power spectrum. 

(iii) Pure pseudo- Ci E and B mode estimation: The p ure 
pseudo-Ci estimator JSmithll20ol ISmith & Zaldar riagal l2007h is 
used to return estimates of the E and B mode angular power spectra 
Cf"^ and Cf^ fro m the estimat ed Q and U maps. This estima- 
tor has been shown dSmithll2006h to return unbiased estimates of 
E and B without the E— >B mixi ng from the a mbiguous modes that 
arise from the field boundaries ( lBuni]||2002r) . The diagnostics are 
the mean estimates of the power spectra, the mean estimates of the 
E and B noise power spectra and the Monte-Carlo error bars for a 
single realisation. The details of the pure pseudo-C; estimator are 
presented in Appendix IaI 

(iv) Filter function: The filter function Fi from a preliminary 
signal only simulation is used to analyse the signal error compo- 
nent to the estimat ed map residuals. The filter transfer function Fi 
JHivon et al.l2002h gives the degree to which signal filtering by the 
map-making process affects the recovered power spectrum. If we 
don't want to lose any information on the signal then the imprint of 
filtering must be negligible {Fi — 1 = 0). 

The power spectru m of the estimated map Ci is related to that of 
the input map C} by JNatoh et al.l J200lh . |Poutanen et alj J2006l) ) 



{Ci) = Fi{C}) + {Ni) 



(39) 



where A^; is the power spectrum of the noise bias. If the estimated 
map is made from signal only TOD, (A;) = so _Fi can be calcu- 
lated by inverting ( |39b . 

Both Ci and C} are convolved with the same beam and have 
the same pixelisation and sky mask. They have identical transfer 
matrices A";;' (see Appendix lAt. so we need not correct for mode 
coupling. 



4 RESULTS 

The simulations were analysed by three map-making algorithms: 
standard naive map-making (equation [JJ, optimal maximum like- 
lihood map-making, and destriping. Destriping was repeated with 
various offset function baseline lengths (hereafter denoted by Ac 
in units of time), ranging from a baseline length Ac = 1 second 
through to Ac = 1000 seconds (~ 16 minutes). The destriping 
code, Descart, operated i n two modes: tradition al destriping mode 
(as in all papers prior to iKeihanen et al.l ( l2005r) ). and in a covari- 
ant destriping mode, makin g use of noise in formation through the 
offset covariance matrix Ca JKeihanen et al . 2005). The two modes 
are compared in Section l431 Elsewhere, all destriping is the covari- 
ant form. 

Both the covariant destriping and maximum likelihood map- 
makers require prior noise information (through Ca and C^^ re- 
spectively). In reality, this noise information would have to be es- 
timated from the TOD directly, highlighting the importance of the 
noise estimation step immediately prior to map-making. In this pa- 
per, we are not addressing the issues of noise estimation and so 
use ensemble average noise information through the power spec- 
trum P{f) used to generate the TOD noise. A number of ap- 
pro aches to noise estimation have been discussed in t he literature 
(eg: iFerreira & Jaff3l200d lAmblard & Hamiltonll 20041) . 



4.1 Signal only maps 

In addition to signal-l-noise simulations, an initial pure signal simu- 
lation was analysed first for the sabre strategy, using the same prior 
noise information as for the signal-l-noise simulations. In this limit, 
any departures in the estimated map from the input map are dis- 
tortions of the signal caused by the map-making process itself. The 
filter functions for this simulation were calculated using J39b where 
Ci and Ci are raw pseudo-C ; s calculated by the HEALPix package 
anafas^ JGorski et a"i]|2005h . 

Figure |4] shows the T, E and B filter functions for the 
maximum-likelihood (upper panel) and destriping (middle panel) 
algorithms. Destriping, covariant or otherwise, does not filter the 
signal at all, despite use of a noise prior through Ca (Figure|4] mid- 
dle panel). The data term in the likelihood ( I29t forces the offset 
amplitude vector to the null vector, as the signal cleaning operator 
Z perfectly removes the signal from the fit. 

The maximum likelihood algorithm does display some mini- 
mal signal distortion (Figure|4l upper panel). The magnitude of the 
distortion is effectively negligible, amounting to less than 0.001% 
of the signal at the worst multipoles {I < 50 for the T map). The 
filtering satisfies a filtering « (^noise at all multipoles and can be 
ignored. 

This filtering can be explained through the presence of degen- 
erate pixels. A minimum of three observations at different mod- 
ulation angles 13 are required to reconstruct T, Q and U for each 
pixel. Any pixel for which this condition is not met is degenerate 
and must be ignored, else the problem becomes singular. 

For Descart, the reconstruction inversion is conducted for each 
pixel separately - if any pixel's 3x3 pointing matrix is singular, 
then the pixel is irrecoverable. For the maximum-likelihood algo- 
rithm, the reconstruction is accomplished by the matrix inverted by 
the MINRES conjugate gradient inverter. If degenerate pixels are 
included, the MINRES iterations do not converge: the improvement 



http://healpix.jpl.nasa.gov/html/facilitie.snode5.htm 
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Table 2. Comparison of the performance of iterative inversion in the 
maximum-IiJcelihood algorithm for different pixel exclusion conditions in 
the signal only simulation. N,„in is the minimum number of hits required 
to accept a pixel whilst convergence is the rms residual upon exit of the 
MINRES algorithm. The Nmin = 2 case includes degenerate pixels, so the 
matrix is singular and the iterations fail before the convergence critereon is 
reached. 
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Figure 4. Upper panel Filter function Fi — 1, for maximum likelihood 
algorithm showing minimal signal degradation from the initial signal only 
simulation. Middle panel Fi — 1 for destriping, displaying zero signal degra- 
dation. Lower panel Example filter function for naive map of TOD filtered 



with C(\ 



-1/2 



filter, where the signals are distorted and T is heavily filtered 



in residuals per iteration tends to zero before a reasonable conver- 
gence critereon is reached (such as 10^'', Patanchon et al. 2007). If 
near-degenerate pixels remain, the condition number of the matrix 
increases, requiring more iterations to invert the matrix (see Table 

Such pixels will be common for ground based polarisation ex- 
periements, where the TOD is naturally split into day chunks and 
maps are made daily. It is well established that these pixels must 
be removed in such a way as to maintain time stream continuity 
JStompor et alj|2002h . Time ordered data sourced from a degener- 
ate pixel is referenced to a junk pixel with zero signal outside of 
the estimated map that is ignored in the minimisation. Degenerate 
TOD are kept for the noise deconvolution step C^y, where the 
TOD value is replaced by a constrained realisation of noise. 

However, some errors from this method propagate through the 
deconvolution step of l lI7b . which is applied approximately through 
FFTs, into neighbouring pixels along the scan direction, causing 
signal filtering. 

This effect does not appear in the Descart destriped maps, 
as the degenerate TOD are removed from the binning/projection 
process of F^ and F respectively. There are no resultant gaps in 
the offset amplitude vector a and so the evaluation of C^ """a using 
FFTs suffers no degradation due to discontinuities. 

For comparison, the lower panel Figure|4]shows the filter func- 

— 1/2 

tion for high-pass filtered TOD using the Cj^ filter. In Fourier 
space, this filter is the square root of the noise filter in OSt . It has 
the property that filtered TOD with 1// noise have the same diag- 
onal noise covariance matrix as TOD with white noise only, pre- 
whitening the TOD. The effect of the filter on the T signal is devas- 
tating. The effect on Q and U is mitigated here due to the modula- 
tion. Whilst the mean filter functions can be evaluated from Monte- 
Carlo simulations and then deconvolved from the power spectrum 
(Hivon et al. 2002), the wiggles in the functions are signal realisa- 
tion dependent and will add to the variance of the recovered power 
spectrum. We suspect the high-£ bias in the B-mode filter function 
is due E-^B mixing caused by distortion of the Q and U signals by 
the filtering. This will be studied in the next paper in the series. 

The filter ing effects here are separate from the signal fi lter- 
ing reported inlPoutanen et al.l ( l2006h . lAshdown et alj ( 12007 ah and 
lAshdown et al.l ( l2007bf) . Our input CMB maps are at the same res- 
olution (HEALPix riside = 512) as the estimated maps, so the fil- 
tering error from applying FFTs to TOD including sub-pixel signal 
gradients is absent. We have considered the sky to be innately pix- 
elised at the experimental resolution, which is one of the underlying 



© 2008 RAS, MNRAS 000, 000-000 



Map-making in small field modulated CMB polarisation experiments 9 



.10 



.08 - 



.06 - 



.04 



.02 



.00 



panels) are very similar. Their difference maps show a small change 
in the magnitude of the noise but no striping structure as for T. The 
structure in this difference map can be understood by noting that the 
1// noise that seeps through to Q and U maps is itself modulated 
by sinusoidal functions during the de-modulation process. 

The maximum likelihood solution produces maps with the 
smallest possible residuals. The relevant statistic to measure the 
ability of the destriper to return maps cleaned of correlated 1// 
noise is the ratio of the destriped map's root-mean-squared (rms) 
residual to that of the maximum likelihood map's rms residual, Ai: 



A. 



I^MLE 



(41) 



Figure 6. Variation of the mean dimensionless Ay = (cr/(Tji/j^£;)j' for 
200 detector fknee = O-l-Hz simulations with destriping function length 
\c in seconds. The diamond at Xq = 0.01-s is the maximum likelihood 
solution and the one at Ac = 43200-s is the naive solution of equation (2). 



assumptions in map-making formalism. If the sky were pixelised as 
assumed, destriping would be lossless to machine precision. 

Including sub-pixel gradients, it is expected that the maximum 
likelihood algorithm would filter the signal more than reported 
here: this is an avenue of on-going research. 



4.2 Reconstructed Map Residuals 

The residual map e between a reconstructed map and the input sim- 
ulated map can defined for each of the Stokes parameters as 



+ e„ 



(40) 



where e^ and £p represent signal error and pixel noise respectively, 
a;°"* is the recovered map estimate and a;^" is the input map used 
to simulate the TOD. The signal error for the algorithms has been 
shown to be negligible, so ep ~* ep . 

4.2. 1 Signal and detector noise simulations 

We first look at the detector noise simulations with /fence = 0.1. 
Differences between the maps from different algorithms are not 
perceptible by eye, as the map is dominated by the signal. However, 
differences are perceptible in the residual maps. The top row of Fig- 
ure[5]contains residual T and Q maps for Descart (with Ac = 1-s), 
whilst the middle row show residuals from the same data mapped 
by the naive algorithm (in which the TOD is mapped using equa- 
tion l[2j with no pre-filtering). The characteristic 1// noise stripes 
are visible in the naive T residuals and are notably absent in the 
Descart T residuals. The reduction in the correlated noise is best 
illustrated by the difference map between the destriped and naive 
maps (bottom left panel of Figure [5]l. The stripes in this map are 
the correlated 1// noise in the pixel map that have been removed 
by the destriping process, visibly following the scanning pattern. 

The Q and U residuals are very close white noise only due 
to the modulation of the signals out of the low fknee detector 1// 
so the destriped and naively mapped residuals (top and middle right 



where i is one of the Stokes parameters T, Q or U. This dimension- 
less statistic is independent of the white noise level in the map, it 
is the constant by which the noise in the map is multiplied above 
optimality. 

Figure |6] shows the variation in mean (over 200 realisations) 
At with chunk length Ac in the fknee = 0.1-Hz simulations. 
The unconnected diamonds are mean At for the maximum like- 
lihood (corresponding to Ac = 0.01-s) and naive (corresponding 
to Ac = tint, the total integration time of the experiment) maps. As 
Ac is decreased. At decays towards optimality at At = 1. The 
best destriped maps are returned at the smallest chunks size con- 
sidered (Ac = 1-s) and have only 0.4% higher pixel residuals than 
the optimal map, achieving 96% of the reduction in pixel residu- 
als that the maximum likelihood method brings over naive binning. 
These numbers are relative to optimality and are dependent only on 
the correlated noise. They are independent of the magnitude of the 
white noise floor. 

The destriping method models the noise as a white component 
+ a correlated component described by a series of offset functions. 
The noise covariance Cn is modelled as 



Cn 



' (^w^t. 



+ FCaF' 



(42) 



where Ca is solely responsible for the off-diagonal part of Cn. 
When the resolution of Ca is increased, the approximation to the 
real Cn is more accurate and so the residuals are smaller The resid- 
ual reduction flattens as Ac is decreased significantly below the 
noise correlation length (10 seconds for fknee ~ 0.1-Hz). 

Mean AQ for these simulations is shown in Figure|7] Despite 
the modulation of the Q and U signals to fmod ~ 20-Hz (200 
times higher than the noise fknee), there remains a gradient in the 
relative noise residuals between the optimal map and the naive map. 
Destriping converges to near optimal noise levels at Ac = 1-s, as 
for the T maps. The difference in pixel error between the algorithms 
is at most of order 0.1%. 

The extra noise power in the naive T maps is at all angu- 
lar scales, as shown in the residual angular power spectra in the 
upper panel of Figure [S] The curves in this plot are the mean T 
residual angular power spectra for all 200 realisations. The resid- 
ual power spectrum for the optimal maximum-likelihood algorithm 
is the solid curve. The long dashed curve is from the Descart 1-s 
residuals, the dotted curve from the Descart 400-s residuals and the 
dot-dashed curve is from the naive residuals. The 400-s baseline 
Descart spectrum is included as an example of "quick and dirty" 
destriping as opposed to using destriping to replace the maximum- 
likelihood approach. This coarser noise model makes some im- 
provement to noise at large scales but fails to reduce noise power 
at scales beneath its Ca resolution. The best 1-s baseline Descart 
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Figure 5. Top row. Residuals between a Descart 1— s output map and the input map, T on the left and Q on the right. Middle row: Residuals between a naive 
output map and the input map, T on the left and Q on the right. Bottom row. Difference between the top an middle row: Naive residuals - Descart residuals. 
Again, T is on the left and Q is on the right. 
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Figure 7. Variation of mean dimensionless Ag with destriping baseline 
length Ac in seconds, for the detector simulations. 
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maps have residuals nearly indistinguishable from optimality at all 
angular scales. 



4.2.2 Signal and atmospheric noise simulations 

The simulated atmospheric noise has more correlated noise, with 
1// fknee ~ 0.2-Hz and a much larger spectral index of a = 1.9. 
Correspondingly, significantly more correlated noise leaks through 
to the T map from naive map-making. In this case, sophisticated 
mappers are essential, reducing the rms T residual by a factor of 
10 and the Q residuals by a factor of 2 (Figure |9}. The majority 
of the improvement is achievable using a baseline of 10-s, achiev- 
ing 98.7% of the possible improvement in T residual power and 
99.0% of the possible improvement in Q residual power. The best 
1-s baseline maps achieve 99.9% of the possible improvement for 
T and 99.9% of the possible improvement for Q. 

Despite the hardware effort of QU modulation using a half- 
wave plate, this noise regime produces reducible Q and U pixel 
noise sourced from correlated TOD noise that can be more than 
halved by the application of even relatively long baseline destrip- 
ing. This assumes no T -^ P mixing from instrumental polarisa- 
tion, the results of which would alias stripes into the Q and U maps 
themselves, strengthening the requirement for destriping. 

The lower panel of Figure [8] shows the mean angular power 
spectra of the T residuals for the algorithms over 200 realisations 
(the curves are the same as those in the upper panel of Figure [Sj. 
It should be noted that the optimal map shows slightly more noise 
power for these simulations than it did in the detector simulations, 
because even the optimal maximum-likelihood solution cannot re- 
move all of the 1// noise. 

The "quick and dirty" 400-s baseline approach is more effec- 
tive at smaller angular scales than it was for the detector noise. The 
best 1-s baseline Descart maps are again nearly indistinguishable 
from optimal maps at all angular scales. 



4.2.3 Fence scan simulations 

The fence scan simulations used the same noise parameters as the 
signal and detector noise simulations with fk,iee = 0.1-Hz and 
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Figure 8. Upper panel: Mean T residual angular power spectra for the de- 
tector fknee = 0.1-Hz simulations. The solid curve is for the MLE resid- 
uals; the dashed curve is for the 1 second destriping residuals; the dotted 
curve is for the 400 seconds destriping residuals; the dot-dashed curve is 
for the naive binning (equation |2) residuals. Lower panel: Same as above, 
but for the mean T residual angular power spectra of the signal-l-atmospheric 
noise simulations. 



Q = 1.0. The scan was designed to cover approximately the same 
number of pixels using the same number of observations. The av- 
erage integration time per pixel is the same, however the isotropy 
and connectivity of the scans are different. Both the fence and sabre 
scans are sinusoidal scans, spending disproportionally more time 
integrating at the edges of the field, the fence scan integrating more 
at all four field edges and the sabre scan integrating more only 
on two edges. The connectivity, or level of cross-linking, is much 
greater in the fence scan, with the two scanning directions ideally 
cross-linked in being perpendicular to one another. 

The rms residuals in the optimal fence maps are 8.7% smaller 
than the optimal sabre maps. Further to this, Figure [Tol shows that 
the Descart T residuals decay toward optimality at longer baselines 
than for the sabre scan, reaching a trough at 10-s baselines: fur- 
ther reductions yield negligibly different residuals as the maps are 
already near-optimal. 

The improvements in Q residuals are similar to the improve- 
ments in the sabre scan simulations: the level of cross-linking has 
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Figure 9. Upper panel: Variation of mean dimensionless A^ for the signal 
+ atmospiieric noise simulations. Lower panel: Variation of mean dimen- 
sionless Aq for the signal + atmospheric noise simulations. 



Figure 10. Upper panel: Variation of dimensionless (Sy for the fence scan 
simulations. Lower panel: Variation of dimensionless &q for the fence scan 
simulations. 



little effect on the near-white noise in the modulated Q and U 
streams. 



4.3 E and B mode errors 

The polarisation modulation design is intended to remove the ef- 
fects of correlated noise from the Q and U maps and thence from 
the E and B power spectra. However, the increased variance in the 
Q and U maps from simple naive birming of a single detector prop- 
agates through to the polarisation power spectra. 

Figures [TT] and [l2] show the mean estimated E and B-mode 
power spectra from the 200 signal-l-detector noise simulations re- 
spectively, after subtraction of the mean noise power spectrum. The 
error bars are the Monte-Carlo error bars for a single signal-l-noise 
realisation, and the dashed curves are the input theoretical power 
spectra. The dotted curve indicates the magnitude of the plotted 
error bar in each bin, which include both experimental noise and 
sample variance from partial sky coverage. The mean estimates are 
unbiased with respect to the input model. 

The mean noise angular power spectra from the sig- 
nal-l-detector noise simulations are shown in Figure [73] The curves 
are the difference between the noise angular power spectra ei- 



ther from naive (solid curve) or destriping (dashed curve) and 
that of the maximum-likelihood noise power spectrum, defined by: 

iJSl{^<^VVro^ _ {N^)MLE^ ^^^^^ (^j^^^^avpro^ -^ jj^^ j^^^-^g spCCtrUm 

from either the 1-s Descart maps or the naive maps. If either is op- 
timal, the curve will be all zeros. 

Whilst neither curve is optimal, the change in noise power is 
orders of magnitude smaller than the signal power at all multipoles 
and has a negligible effect on the error bar magnitudes. 

Figure[T4lshows the noise angular power spectra difference for 
the signal-l-atmospheric noise simulations, where the changes are 
very significant. At multipoles / > 400, the B-mode noise angular 
power spectrum becomes larger than the input signal angular power 
spectrum. 

The effect of this on the B-mode error bars is clearly seen in 
the significance of the total detection. The total significance estima- 
tor C bins the bandpowers of the spectrum (Cb) into a single bin, 
weighted proportionally to the power of the input fiducial model 
Cj and inversely proportional to the variance of the bandpower 
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Figure 11. Plot of the mean estimated Cj^^ for the fknee = 0.1-Hz sim- 
ulations, using Descart on 200 noise and 200 signal + noise realisations 
with 1-s baselines. The eiTor bars are the average for each individual sig- 
nal+noise realisation. The dashed curve is the input power spectrum and the 
dotted curve is the magnitude of the plotted eiTor bars. 
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Figure 13. Upper panel: Differences between E-mode noise power spec- 
tra from the fknee = 0.1-Hz simulations in pK^. The solid curve is 



i^Hr. 



{Ni)mle whilst the dashed curve is {Nf')aestriped 



(Nf')MLE for Descart with 1-s baselines. Lower panel: Differences be- 
tween B-mode noise power spectra from the fknee = 0.1-Hz simulations 
in pK^. The solid curve is {N^)naive — (^P)mle whilst the dashed 
curve is {Np)ciestriped ~ {^P)mle for Descart with 1-s baselines. 



Figure 12. As Figure [TTI but for Cj^^estimates from Descart 1-s maps. 



C 



E. 






— 7^ 



(43) 



E, 



The significance of the detection is then given by 

(C) 

V((C'-(<5»2>' 

The significance of the E and B-mode detections for the algo- 
rithms with atmospheric noise are shown in Table[3] The increased 
noise from the naive maps slightly erodes the significance of the 
E-mode detection and destroys the detection of B-modes. We re- 
iterate that this apphes only in the case of a single detector and 
would not apply, for example, to detector differencing experiments. 



4.4 Spurious B modes 

200 signal-l-noise simulations were created using input CMB maps 
with artificially zero B-modes and typical detector 1// (fknee ~ 



Table 3. Atmospheric noise simulation (fkne 
significance 



: 0.2, a = 1.9), detection 



E-mode B-mode 



MLE 37.90 8.22 

Descart 1-s 37.90 8.22 

Naive 35.29 0.02 



0.1-Hz). Maps were made from these TOD streams using each al- 
gorithm and the B-modes of the polarisation fields were estimated 
as in the previous section. 

There was no evidence of the presence of spurious B-modes 
due to mode mixing from any of the algorithms. Figure [Tsj shows 
the mean estimated B mode for these TOD for Descart with 1-s 
baselines (the plots are identical for the naive mapper and the MLE 
algorithm). It is unbiased, correctly returning an ensemble average 
zero B mode estimate. 
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Figure 14. Upper panel: Same as the upper panel of Figure [13] but for the 
atmospheric noise simulations. Lower panel: Same as the lower panel of 
Figure[T3] but for the atmospheric noise simulations. 
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Figure 15. As Figure [T2l but for the simulation with zero C^^ from Descart 
1-s maps. 




Figure 16. Comparative runtime {tjtMLE) variation with destriping length 
\c- Upper panel: runtimes for the signal+detector noise simulations. 
Lower panel: runtimes for the signal+atmospheric noise simulations. 



4.5 Computing resources 

The principal concern for computing resources for map-making is 
CPU time, as it feasible to hold many single days' worth of TOD 
in memory at one time. The naive algorithm represents the quick- 
est, and dirtiest, possible method for reducing TOD into a map. Its 
computation time is dominated by the input/output overhead. 

The iterative scaling of the destriping algorithm is 0{Nt + 



Ua log2 



,compared to 0{Nt{'i + logj |A'^t|)), for the maxi- 



mum likelihood algorithm (where Nt is the number of number of 
observations). Effective values ofua always satisfy Ua « Nt, for 
example, the Ua used in this paper are between 2 and 5 orders of 
magnitude smaller than A^'t, so that the iterative scaling of Descart 
Nt + Ua logj \na\ ~ Nt- Noting that typical time stream lengths 
for day strategies are of order 10^, we can predict that destriping 
will require approximately an order of magnitude less CPU time to 
run. 

In practice, the time-stream can be split into chunks to speed 
up the Fourier de-convolutions, where the chunk size is some 
multiple of the correlation length of the time-stream noise. For 
a chunk size of Uc the iterative scaling of the FFTs reduces 
to 0{Ntod^o§,2 l"-c|) for the maximum-likelihood algorithm and 
0{Nt + ria logj \nca\) for destriping algorithm, where rica ~ 
Uc/Xc is still 2 — 5 orders of magnitude smaller than ric, main- 
taining the reduced complexity of the destriping algorithm. We also 
note that the use of hardware optimised FFT libraries, as opposed 
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Table 4. Typical numbers of Descail iterations for tlie signal+detector noise, 
signal+atmospheric noise and fence scan simulations. Also shown is the 
destriping baseline length Ac in seconds and the corresponding number of 
offset functions in the system (n^). 



Ac (s) 


ria 


detector noise 


atmospheric noise 


fence scan 


1 


43200 


40 


127 


18 


5 


8640 


36 


89 


15 


10 


4320 


32 


65 


15 


100 


432 


22 


25 


11 


400 


108 


16 


19 


9 


1200 


36 


10 


11 


7 



to the ubiquitous FFTW library, will further accelerate the FFT pro- 
cedures. 

Scaled computing times (t/tMLE, where Imle is the CPU 
time required for the maximum likelihood algorithm to com- 
plete) for the algorithms are shown in Figure [16] for both the sig- 
nal-l-detector noise simulations and the signal-l-atmospheric noise 
simulations. The codes are all serial and the runs were completed 
on the same machine, using dual-core Intel Xeon 3GHz CPUs, with 
the same allocated resources. The longest Descart baseline used 
was 1200-s, which returned a 49.1% reduction in Q and U residu- 
als for the atmospheric noise simulations (Figure |9l lower panel), 
out of a possible 55.04% for the optimal algorithm, whilst achiev- 
ing a 22 X speed improvement over the optimal algorithm. 

In addition to the iterative scaling, the computing time is also 
determined by the number of iterations required to solve the sys- 
tem. Table |4] shows the typical numbers of iterations required by 
Descart for the simulations compared to the destriping baseline 
length used. The fence scan simulations required fewer iterations 
than the sabre scan simulations with the same noise parameters (de- 
tector noise column). The greater cross-linking of the fence scan 
has produced a system that is easier to solve than the sabre scan 
especially at short baseline lengths where the system (na) becomes 
large. 

The number of iterations is similar for both sets of sabre scan 
simulations (detector noise and atmospheric noise columns) at very 
long baselines, but for the atmospheric noise increases much more 
quickly with baseline lengths < 100-s, requiring ~ 3x as many 
iterations than the detector noise simulations at the near optimal 
I-s baseline. 

By modelling the correlated noise as a series of offset func- 
tions, the destriping algorithm is solving a smaller system than the 
full maximum likelihood algorithm (generally Ua < rip except for 
the very shortest baselines, where Ua and rip are number of offsets 
and pixels respectively). Typical experimental resolutions in future 
experiments will be higher than that used here (riside ~ 512), for 
example Uside ~ 2048 for C;OVER, increasing Up by a factor of 
16 and leaving Ua untouched. 



4.6 Importance of including Ca 

Prior to the deve lopment of the MADAM algorithm 
JKeihanen et alj 1 20050 . destriping was conducted without prior 
noise information. This "traditiona l" des tr iping used only th e 
data term in (|28]| (Burigana et al. lll997h . iDelabrouillj jl998l) . 
iMaino et al. I ( ll999l)lKeihanen et al.l 12004)), negelecting the prior 
term containing Ca- Without this noise information, the potential 
of destriping to remove correlated noise is considerably reduced. It 
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Figure 17. Comparison between destriping with (solid line with circles) and 
without (dashed line with x) Ca information. 



is not possible to use traditional destriping at Acs where there is 
significant correlations between the offset function amplitudes a^. 

Figure [it] shows the effects of using traditional destriping in 
this regime. The dashed line shows the returned destriping At vs 
Ac- At large Ac, where the offsets are uncorrelated, traditional de- 
striping performs similarly to the covariant destriping using Ca (the 
solid curve with circles). When Ac is reduced into the regime of 
correlated offsets, the residuals behave pathologically, exploding 
to considerably higher magnitude than for the naive maps. 

This can be explained in two ways. The noise model in the data 
term of ( I28l l assumes no correlations between the offsets. When 
correlations are present, the noise model breaks down and the re- 
sults are junk. Including Ca introduces prior information on noise 
correlations into the model. This can also be recognised as linear 
regularisation, where the introduction of the regularising matrix Ca 
forces the solution to the system to be well behaved. 

This highlights the vital importance of including Ca ■ With its 
inclusion, destriping can be used to its full potential in returning 
near optimal maps at a fraction of the computing time. 



5 DISCUSSION 

In this paper, we set out to compare the destriping approach to map- 
making to the optimal maximum likelihood estimator. In particu- 
lar, we want to know if destriping is fast and accurate enough to 
replace the maximum likelihood estimator for partial-sky, polarisa- 
tion modulation experiments. 

We have written a new covariant destriping code named 
Descart (DEStriping CARTographer) and a new maximum- 
likelihood estimation code and applied them to non-circular scan- 
ning strategies for the first time. We introduce variations in cross- 
linking through our realistic, but non-ideal, sabre scan and the un- 
realistic, but ideally cross-linked, fence scan. 

Neither of these strategies allow the co-addition of circular 
scans before destriping, a process relied upon by the early "tradi- 
tional" destriping e f forts {B urigana et aljl 19971 . JDelabrouilleJ! 1 9981 
[Maino et al. 1999, Keihanen et al. 2004). We m ust use a "co- 
variant" destriping algorithm ( Keihanen et al.ll2005r) . incorporating 
prior knowledge of the noise power spectrum through Ca. This al- 
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lows the use of very short baselines that produce nearly optimal 
residuals in the output T, Q and U maps. We have shown that, with- 
out this prior, the noise model clearly breaks down pathologically 
at shorter baselines and the returned map becomes junk. 

The simulations we produced applied destriping to different 
noise regimes than those to which it has previously been applied. 
We have probed regimes of both detector noise and atmospheric 
noise, the latter of which represents a much greater challenge as 
it dominates noise even in the current generation of ground-based 
polarisation experiments. Previous work on the efficacy of destrip- 
ing concentrated on the Planck satellite mission, assuming low lev- 
els of correlation in the 1// noise, as is possible in the absence 
of the atmosphere, usin g knee frequencies of 0.03 -Hz for the HFI 
(^Ashdownetal."2007a') and 0.05-Hz for the LFI l Ashdown et"aD 
[2007b). We have simulated detector noise with fknee = 0.1-Hz 
and atmospheric noise with fknee = 0.2-Hz and a spectral index 
Q = 1.9, similar to the level experienced by QuAD tHinderks et al.l 
l2008h . 

Destriping performs better in the more realistic and challeng- 
ing atmospheric noise simulations than in the detector only noise 
simulations. For the shortest baseline of 1-s, it achieves 99.9% of 
the possible improvement in T, Q and U residuals achievable with 
the full maximum-likelihood algorithm. This amounts to reduction 
in T residual magnitude of 90.02% and in Q and U of 55.04%. One 
of the motivators of this study is the requirement to produce high 
quality T maps, in addition to the Q and U maps, in o rder to remove 
T — > P leakage from instrumental polarisation (eg: IJohnson et al.l 
l2007h . which will also leak atmospheric 1// into the Q and U sig- 
nal despite the modulation of those signal by the rotating half- wave 
plate. 

We have found that even without the simulation of this leak- 
age, it is vital to deal with the correlated noise in order to observe 
B-modes: with naive mapping, the significance of the B-mode de- 
tection was destroyed, falling from 8.22cr for the optimal and 1-s 
Descart algorithms to 0.02(t. This is a conclusion specific to the 
single detector case, as the atmospheric noise is completely corre- 
lated between detector pairs that share the same horn. Exploiting 
this correlation, by combining data from both channels, can com- 
pletely remove the atmospheric noise. 

The detector noise simulations maps also saw significant im- 
provement, with the 1-s baseline Descart maps showing 96% of the 
possible improvement in T and 94.71% of the possible improve- 
ment in Q residuals, though the latter amounts to less than 0.1% of 
the noise power in this case. Whilst this is not reducible by detector 
time-stream differencing, the effect on the B-mode noise was very 
small. 

The 1-s Descart maps were returned with improvements in 
computing time of a factor of 6 for both detector and atmospheric 
noise. However, for the atmospheric noise simulations, the 98.73% 
of the improvement in T and 99.9% of Q and U was returned us- 
ing a longer baseline of 10-s, which delivered an improvement in 
speed of 10 x. Most significantly, 89.16% of the improvement in 
Q residuals were achieved with a very long baseline of 1200-s, re- 
turning a speed improvement of 22 x over the maximum likelihood 
algorithm. This can be understood by noting that the atmospheric 
noise has a lot more noise power at low frequencies than the de- 
tector noise simulations do. As the resolution of Ca is increased, 
improvements in the noise are reached more rapidly as the extra 
noise is resolved. 

The destriping algorithm was shown not to distort the CMB 
signal, which is the prohibitive drawback of fast filtering methods. 



such as the C , 



-1/2 



filter, which introduces extra variance into the 



power spectra. Further, destriping is unbiased for B-modes, pro- 
ducing correct null detections for simulations with artificially zero 
B-modes. 

The ideally cross-linked fence scan showed considerably 
quicker reduction in residuals for T with decreasing baseline length, 
than did the sabre scan with detector noise, despite identical noise 
in the two sets of simulations. Improvement in residuals reached a 
trough at 10-s baselines: further improvements were negligible for 
shorter baselines. Q and U residuals improved similarly to the sabre 
scan. 

Covariant destriping produces very close to optimal maps, but 
much more quickly than the full maximum-likelihood algorithm, at 
speeds that make it applicable to the large datasets from upcoming 
B-mode experiments. Destriping can also reduce the vast majority 
of atmospheric noise in single detector time-streams very quickly, 
using more approximate long baselines. 

This paper is the first in a planned series investigating destrip- 
ing for ground based polarisation experiments. The focus of future 
work will be for experiments in which the Q and U signals are sub- 
ject to 1// noise (eg: QUIET). For these experiments, the reduction 
in 1// will be critically important to the science return. The simu- 
lations will be extended to higher resolution and will investigate the 
effects of sub-pixel signal gradients by using CMB input maps at 
much higher resolution than the recovered maps. The simulations 
will also be extended to multiple detectors, with noise correlated 
between detectors sourced both from detector fluctuations and from 
the atmosphere. Destriping will be applied alongside and directly 
compared to the only realistic alternative: the Monte-Carlo based 
MASTER method where the time-stream is high-pass filtered prior 
to map-making ( Hivon et al. 2002) . 
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APPENDIX A: PURE PSEUDO-Cl ESTIMATION 

Spatial information about the comparative map-making residuals 
is gained by analysing the angular power spectrum of the residual 
maps. Estimating B-mode power spectra with partial sky coverage 
is a nontrivial problem, due largely to the presence of ambiguous 
modes ( lBuniill2002l : lGhallinor & Ghorj|200 5) which receive contri- 
butions from both E-mode and B-mode power on the full sky. This 
"_B -+ B mixing" can act as an extra source of noise, for a B- 
mode power spectrum estimator which does not filter out ambigu- 
ous modes. Because B-modes are a primary scientific target for the 
next-generation polarisation experiments considered in this paper, 
it will be important for the analysis pipelines of such experiments to 
use a B-mode power spectrum estimator which does not suffer from 
E —> B mixing. Accordingly, we analyse our residual m aps us- 
ing one such es timator: the pure pseudo-Cf estimator from dSmithl 
l2006l : ISmith"& Zaldarriaga 200' ^, which generalizes the MAST ER 
construction JHivon et alji2002l : lBrown. Gastro & Tavlonl2005l) by 
eliminating _B — > _B mixing from ambiguous modes. In this ap- 
pendix, we briefly summarize its construction and key properties. 

First, in order to apply pure pseudo-Cf power spectrum es- 
timation, one must choose (heuristically) a pixel weight function 
W{n). In this paper we have used cosine apodization: in spherical 
coordinates with north pole at the center of the survey, the weight 
function is given by: 



WiOA) 



< r — Tt 
> r 



(Al) 



where r is the survey radius and r« is an apodization length. In this 
paper, the survey radius is r = 7° and heuristic apodisation radii 
of r, = 4.3° and r, — 1.61° are used for multipole ranges ^ ^ 40 
and ^ > 40 respectively. 

We then define pseudo multipoles 2^, a^„ by: 
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(A3) 



where \l±{n) = (Q ± iU){n ) and 3, 5 are the spin raisin g and 
lowering operators defined in IZaldarriaga. & SeliakI ( 119971) . The 
derivative operators have been placed differenly in the definitions 
ofaf^ , Hfrn above, so that Hfm will receive contributions from am- 
biguous modes, but a£„ will not. Note that n± is a spin (±2) field, 
but the integrands appearing in Eqs. l lA2t . JA3t are spin-zero and 
do not depend on the choice of local frame. 

The next step is to define pseudo power spectra, in bands b, 
by: 



C^ 



Ct 



21 + 



21. 






(A4) 
(A5) 



where Pte is a binning operator (Pte — unless £ G b). 

As constructed, the pseudo spectra are biased estimators of the 
signal power spectra; one has 






K, 






r<EE 

'-'b' 

r<BB 



(A6) 



where the transfer matrices Ki,y depend only on the survey ge- 
ometry and pixel weighting, and can be computed efficiently using 
the algorithm in (Sm ith 2006) . Note that the lower left block in 
Eq. ( IA6t is zero because af^ does not receive contributions from 
ambiguous modes. We do not include an additive term from noise 
bias on the right-hand side of Eq. ( IA6t as this equation is for the 
case of estimating the power spectrum of a pure noise map. 

The final step in the construction is to define unbiased estima- 






K, 



KZu, 

bb' 
jy-+pure 

^bV 



(A7) 



These are unbiased ({(5f ^) = Cj^'^ and (Cf ^) = Cf ^) power 
spectrum estimators, with no E -^ B mixing: the B-mode estima- 
tor has the property that it receives no contributions from ambigu- 
ous modes. 



APPENDIX B: DESTRIPING FOR MULTIPLE 
DETECTORS INCLUDING CORRELATIONS 

The time ordered output from each detector , j/t, is stacked end 
to end to form a single Ntod x Ndctector vector. The vector of 
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offset function amplitudes a^, where a indexes the offset function, 
is similarly stacked end to end to form a Noffsets x Ndetector 
vector, as are the projection matrices Ptp and Fta- 

Each time ordered datum is now indexed by i = It, where I 
denotes a particular detector and t denotes a particular time. Like- 
wise, we index the offsets as j — la. With the new indices, the full 
focal plane TOD vector yi is modelled as 



Vi = PipXp + FijOj + nw,i 



(Bl) 



where Fij Uj models the correlated noise component of yi and nw,i 
is the uncorrelated white noise. 

The likelihood for the system to be minimised is 



2 In L = {yi~ PipXp - Fi^a^ )'^C^^^, 



Fi'pXp tiijiaji) + aj Oa,ji''^j' 



(B2) 



The white noise covariance matrix Cw,ii' is utod indepen- 
dent Ndetector matrices describing the correlation of the white 
noise between detectors. For atmospheric common mode noise, we 
assume that this matrix is diagonal 

a 5ii> 



C- 



W.ii' 



(B3) 



so that white noise is uncorrelated between detectors. 

The offset covariance matrix Ca encodes prior information on 
the correlations in the long term atmospheric 1// between detec- 
tors. Each (Noffset X Noffset) sub-matrix is not diagonal but cir- 
cu lant and can be inverted in the Fourier domain, using the method 
of lPatanchon et all feOOTJ) . 

We define a multi-detector Fourier transform operator F, such 
that the Fourier transforms yk of the TOD from each detector, 
stacked end to end, is 



yk = Fyi 



(B4) 



where the index k — If denotes the Fourier mode / of detector I. F 
is a block diagonal matrix, in which each block is Fourier transform 
operator for a single detector channel. 

The Fourier domain offset covariance matrix is 



Rkk' — FCa,ii'F . 



(B5) 



Each detector-detector sub-matrix of R, which we label Rui 
corresponding to detector combination II', is diagonal and each di- 
agonal element is an independent Fourier mode. Its inverse R^^ is 
easy and quick to compute explicitly, an operation that only needs 
to be accomplished once. 

The inverse of the offset covariance is obtained by 



[C-\,=F^[R-\,F. 



(B6) 



However, there is no need to calculate and store this in the 
time domain, as the operation Ca^a is completed more quickly by 
switching between time and Fourier space 



Ca^a = J'-^[R-^J^[a]] 



(B7) 



Armed with this technique for painlessly inverting the offset 
covariance matrix, we can build the estimator for the offset ampli- 
tudes 



(Fj,i,Zi,iFij + a^C;],.)a, = Fj-i-Zi^ij/, 



(B8) 



where Z^i is block-diagonal in I and each detector's block diagonal 
sub-matrix is given by ( I30t . 
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